Abstract: Accurate maintenance effort and cost estimation are essential for effective
software development. By identifying software modules with poor maintainability,
Software Maintainability Prediction (SMP) plays a crucial role in managing software
maintenance expenses. Previous research efforts have used multiple regression
techniques to predict software maintainability, but the results regarding various
accuracy and performance metrics are inconclusive. As such, developing a methodology
that can recommend regression techniques for software maintainability prediction in the
face of inconsistent performance or accuracy metrics is imperative. This research
addresses the critical issue of software maintainability and presents a novel approach,
an SMP model utilizing the Predictor Importance (PI) method, Multiple Linear Regression
(MLR), and five machine learning techniques. The proposed SMP model integrates ten
static source code metrics from object-oriented programming. MLR and PI are used for
feature selection, and the SMP model's performance is evaluated based on accuracy and
the Mean Magnitude of Relative Error (MMRE). Our findings are promising: for the User
Interface Management System (UIMS) software, the proposed SMP model achieves an MMRE of
0.2441 and an accuracy of 91.91%. Similarly, for the Quality Evaluation System (QUES)
software, an MMRE value of 0.2222 is achieved alongside a maximum accuracy of 80.95%.
The ensemble method, when compared to other Machine Learning (ML) techniques,
exhibits superior performance. These results affirm the effectiveness of our approach,
contributing to the enhancement of software maintainability in object-oriented
programming systems.


Introduction 

The qualities of object-oriented (OO) software are
essential for satisfying explicit and implicit requirements
(Colakoglu et al., 2021). High-quality software is always
a priority because of its extensive effects on many facets
of life and the economy. Software maintenance is the
most expensive stage of development, accounting for
roughly 75% of total costs. Investing in maintenance
activities is essential because software bugs jeopardize
functionality. According to a report from the Consortium
for Information and Software Quality (Garomssa et al., 2022),
poorly coded software caused a staggering 2.08 trillion
dollars in losses in the US alone in 2020, and 4.4 billion
people were impacted by software flaws in 2016 alone,
resulting in a 1.1 trillion-dollar loss to the global economy.
The ISO/IEC 9126 standard has identified efficiency,
functionality, maintainability, portability, reliability, and
usability as characteristics of high-quality software (Jung
et al., 2004). Maintainability has recently gained
significant attention as a crucial quality indicator for
software systems' success. Software maintainability
refers to the ease with which a system or component can be
modified to fix flaws, enhance performance, or adapt to
environmental changes.
Paying closer attention to software maintenance can affect
the overall cost of software development. As a solution,
many researchers have provided various SMP models
(Ahmed and Al-Jamimi, 2014; Al Dallal, 2013; Wang et
al., 2019; Zhang et al., 2015). SMP forecasts the costlier
classes before the maintenance phase; hence, the cost can
be minimized by giving these classes closer attention. The
SMP model needs independent and dependent metric
values based on historical data of different software
versions. Figure 1 illustrates the standard procedures for
estimating the maintainability of any OO software. The
Unified Modelling Language (UML) class diagram is
used to identify the software's essential classes, and the
CKJM tool (Chidamber and Kemerer, 1994) calculates
various OO metrics for each class. Then, using feature
selection techniques, the right feature sets are chosen.
The developed SMP model then uses these metric
collections as inputs to predict maintainability for each
distinct class of software. In the literature, various OO
metrics (Mahfuz and Shill, 2023; Haner and Ercelebi,
2023; Ouellet and Badri, 2023) are used as predictor
variables to develop the SMP model, whereas the change
metric is used as the response variable (Malhotra and Chug,
2014; Elish et al., 2015; Kumar and Rath, 2016; Kumar et al.,
2017). For training SMP models, various machine
learning and statistical techniques are provided in the
literature. Association rule mining, Bayesian networks,
clustering, neural networks, regression-based models, and
Support Vector Machines are some of the widely used ML
methods for maintainability prediction (Zhang et al., 2015;
Kumar et al., 2017; Malhotra and Lata, 2021).
The SMP model's effectiveness depends on choosing
the right metrics for object-oriented source code. The
feature selection process entails selecting a suitable
subset from the various object-oriented programming metrics
on offer (Kumar and Rath, 2017; Moradi et al., 2022).
It also remains an open question how to develop an
accurate SMP model that can predict the maintainability
of a class. With these facts in mind, the following goals
have been established for this study:

• To determine the significant relationship between OO
metrics and the change metric.

• To select an appropriate collection of OO metrics for
SMP creation.

• To create the proposed SMP model and compare it
with existing SMPs.


To achieve the objectives mentioned above, data from two
commercial OO software systems are collected, and OO metrics
and change metrics are extracted to build the data set.
Further, MLR is applied to investigate the impact of each
OO metric on the Change metric based on hypothesis
testing. Additionally, PI is used for feature selection.
Three cases are created, and five different ML
techniques are applied to each case to create an SMP
model. The performance of each SMP model is then
evaluated using accuracy and MMRE.


Related Work 

This section summarizes the body of research
on the use of software metrics and
their applicability to SMP, as shown in Table 1. Table 1
shows that, for SMP, maintainability is measured by the
Change metric in various studies (Li and Henry, 1993; Koten
and Gray, 2006; Zhou and Leung, 2007, 2013;
Al-Jamimi, 2013), whereas the maintainability
index (MI) is used by Zhou and Xu (2008), Coleman
et al. (1994), and Zhang et al. (2015).

As Table 1 illustrates, the majority of the existing
research focuses on particular methods and
performance indicators. The study of predictor
significance and method integration is noticeably
underemphasized. For instance, even though some
researchers have used various methods, such as Support
Vector Machines and Bayesian networks, none have
specifically addressed the idea of predictor relevance.
Moreover, the novel method that combines
Object-Oriented Metrics, Multiple Linear Regression, and
Predictor Importance is still unexplored.


Research Methodology 
To create an effective SMP model, the current
study investigates how OO metrics and the change metric
relate to one another. The study focuses on three cases for
feature selection and training using five machine learning
techniques. The current study involves the following
steps:

Figure 1. Flow chart for SMP model. (Recoverable flowchart labels: Identification; Prediction.)

Table 1. An overview of the empirical research on maintainability

Year | Author | Maintainability | Techniques | Performance measure
1993 | Li & Henry | CHANGE | Regression Analysis | R² and Adjusted R²
1994 | Coleman et al. | Maintainability = 171 - 5.2 × ln(aveVol) - 0.23 × aveV(g') - 16.2 × ln(aveLOC) + (50 × sin(sqrt(2.46 × perCM))) | Regression Analysis | -
2006 | C. van Koten, A.R. Gray | CHANGE | Bayesian network | MMRE
2007 | Zhou & Leung | CHANGE | Multivariate adaptive regression splines | MMRE
2008 | Zhou & Xu | Maintainability Index (MI) = 171 - 5.2 ln(aveV) - 0.23 aveV(g') - 16.2 ln(aveLOC) + 50 sin(sqrt(2.4 perCM)) | Univariate and Multivariate Regression Analysis | Absolute relative error, magnitude of relative error, and R squared
2013 | Al Dallal | CHANGE | Multivariate logistic regression analysis | Precision, Recall, Inverse Precision (denoted IP), Inverse Recall (denoted IR)
2013 | Ahmed & Al-Jamimi | CHANGE | Mamdani fuzzy inference engine | NRMSE, MMRE
2014 | Malhotra & Chug | CHANGE | 1. Group Method of Data Handling; 2. Feed Forward 3-Layer Back Propagation Network; 3. General Regression Neural Network | magnitude of relative error (MARE)
2015 | Elish et al. | CHANGE | Ensemble methods | MMRE
2015 | Zhang et al. | MI & ME | Automated tool: SMPlearner | Spearman's Rank Correlation Coefficient
2016 | Kumar and Rath | CHANGE | Functional link artificial neural network (FLANN) with genetic algorithm (GA), particle swarm optimization (PSO), and clonal selection algorithm (CSA); also rough set analysis (RSA) and Principal Component Analysis (PCA) | Standard Error of Mean, MAE, MMRE
2017 | Kumar et al. | CHANGE | Support Vector Machine | Precision, Recall, Specificity, F-Measure, AUC
2019 | Wang et al. | CHANGE | Fuzzy Network | MMRE
2020 | Malhotra & Lata | change count (CC) | AB, C4.5, BAGG, IRBFNN, KNN, KS, LR, MLP-CG, RBFNN | Sensitivity, Specificity, G-mean, Balance
2021 | Malhotra & Lata | CM (Class Maintainability) | 28 classification techniques | g-mean (GM) and balance (BL)
2023 | Kumar & Kaur | CHANGE | Multi Criteria Decision Making with 22 regression techniques | Eight performance measures
2023 | Jaya Bharath et al. | CHANGE | Gradient Boost Classifier | Accuracy
2023 | Hu et al. | Static Code | Deep Neural Network (DeepM) | Accuracy

DOI: https://doi.org/10.52756/ijerr.2023.v36.013

Int. J. Exp. Res. Rev., Vol. 36: 135-146 (2023)

1. Dataset Formation: The initial dataset comprises ten
OO metrics, considered predictor variables or input
variables, and the change metric, the response variable
representing software maintainability. These metrics
are collected from software projects to create the
foundation for the research analysis.

2. Preprocessing: Before conducting the analyses, the
dataset undergoes preprocessing (missing data handling,
outlier removal, and normalization).


3. Feature Selection and Case Formation: Three distinct 
cases are formed for feature selection, each involving 
different methods for selecting relevant features to 
build predictive models: 

Case 1: As a baseline, all ten OO metrics are used as
features for both training and testing the software
maintainability prediction model.

Case 2: MLR is employed as a method for hypothesis
testing and feature selection.

Case 3: PI analysis is utilized for feature selection.

4. Training and Testing: For each of the three feature
selection cases, five different machine learning
techniques, i.e., Ensemble (ENS), Classification Tree
(CT), Naive Bayes (NB), Discriminant Analysis (DA),
and Support Vector Machine (SVM), are employed for
training and testing the predictive model. Their brief
descriptions are as follows:


• ENS: A classification ensemble is a prediction model
made up of a weighted combination of several
classification models. Combining several classification
models often improves prediction accuracy (Elish et al., 2016).

• CT: Classification trees, also known as decision trees,
are used to predict responses to data. To predict a
response, the decisions in the tree are followed from the
root (starting) node down to a leaf node, which holds the
response. Classification trees give nominal responses,
such as 'true' or 'false' (Alsolai et al., 2020).

• NB: Naive Bayes refers to a group of classification
techniques based on Bayes' Theorem. It is not a single
algorithm but a family of algorithms that share a common
premise: every pair of features being classified is
assumed to be independent of each other, given the class
(Kaur et al., 2014).

• DA: Discriminant analysis assumes that different classes
generate data from different Gaussian distributions.
During training, the classifier estimates the parameters
of a Gaussian distribution for each class. To predict new
data, it chooses the class with the lowest
misclassification cost (Yenduri and Gadekallu, 2023).

• SVM: It finds the optimal hyperplane in an
N-dimensional space that separates data points into
different classes (Gupta and Chug, 2020).

5. Performance Evaluation: Comparative analysis using
MMRE and accuracy.

The accuracy of the SMP model is calculated using a
confusion matrix. It has four cells: true positive (TP),
true negative (TN), false positive (FP), and false negative
(FN). Let positive denote a high-maintenance class and
negative denote a low-maintenance class. Then:
TP: The predicted value is positive, and the actual value is also positive.
FP: The predicted value is positive, but the actual value is negative.
FN: The predicted value is negative, but the actual value is positive.
TN: The predicted value is negative, and the actual value is also negative.
The flowchart for the proposed work is presented in Figure 2.
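The four confusion-matrix cells described above can be counted directly from paired actual and predicted labels. The following is a minimal sketch, not the study's implementation; the label strings and the six example classes are hypothetical.

```python
def confusion_counts(actual, predicted, positive="high"):
    """Count TP, FP, FN, TN, treating `positive` as the high-maintenance label."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of correctly classified instances."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical labels for six classes (illustration only).
actual = ["high", "high", "low", "low", "high", "low"]
predicted = ["high", "low", "low", "high", "high", "low"]

counts = confusion_counts(actual, predicted)
acc = accuracy(*counts)
print(counts, round(acc, 4))
```

Here four of the six hypothetical classes are classified correctly, so the accuracy is 4/6.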


Figure 2. Flowchart of the proposed SMP model. (Recoverable flowchart labels: Predictor importance; Development of SMP model; Machine Learning: DA, ENS, CT, SVM, NB; Testing Set = 30%; Performance evaluation: MMRE, Accuracy.)


Experimental Setup 

The SMP's creation, considering various case studies, 
is the primary subject of the present section. The data is 
normalized to increase accuracy, and both dependent and 
independent variables are chosen to create the SMP 
model. 
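The text does not state which normalization scheme is applied. As an illustration only, the sketch below assumes min-max scaling to the range [0, 1]; the function name and the raw values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a metric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant metric carries no information, so avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw WMC values for four classes (illustration only).
wmc_raw = [2, 10, 6, 18]
wmc_scaled = min_max_normalize(wmc_raw)
print(wmc_scaled)
```

The constant-column guard matters for data such as a metric that takes the same value for every class, where the range is zero.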
Dataset Description 

The dataset is obtained from two of the most popular
commercial software systems, i.e., UIMS and QUES,
which were developed using the Classic-Ada programming
language, which supports the OO paradigm. UIMS
contains ten OO metrics and a Change metric with 39
instances, whereas QUES contains ten OO metrics and a
Change metric with 74 instances. The dataset was
proposed by Li and Henry (1993).
Dependent variable: Change metric 
Authors have developed different definitions and metrics
for software maintainability (Li and Henry, 1993; Zhou
and Leung, 2007). According to the existing literature,
most researchers use the "MI" and the "change metric" to
evaluate the maintainability of software. In the current
study, the change metric is considered to be
maintainability. Maintainability is measured as the amount
of modification made to the code throughout maintenance.
A line change is defined as either the "addition" or
"deletion" of lines of code within a class during the
maintenance phase (Malhotra and Chug, 2014; Li and
Henry, 1993; Zhou and Leung, 2007).


Predictor/Independent Variable: OO metric 

The current study focuses on ten static source code 
metrics within the OO paradigm. Selected OO metrics for 
SMP are shown in Table 2. 


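Table 2. List of OO metrics used in the proposed SMP model

Metric suite | Metric | Object-oriented property
(Chidamber and Kemerer, 1994) | DIT | Inheritance
(Chidamber and Kemerer, 1994) | LCOM | Cohesion
(Chidamber and Kemerer, 1994) | NOC | Inheritance
(Chidamber and Kemerer, 1994) | RFC | Coupling
(Chidamber and Kemerer, 1994) | WMC | Complexity
(Li and Henry, 1993) | DAC | Abstraction and coupling
(Li and Henry, 1993) | MPC | Coupling
(Li and Henry, 1993) | NOM | Encapsulation
(Li and Henry, 1993) | SIZE1 | Elements of source code
(Li and Henry, 1993) | SIZE2 | Elements of source code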


DIT is the maximum length of the path from a class node to the
root of the inheritance tree. LCOM assesses the cohesion and
interdependence of the methods in a class; it quantifies how
much information, or how many variables, the methods of a
class share. A higher LCOM value suggests weaker cohesion,
indicating that the methods are less related or lack semantic
coherence. NOC counts the immediate subclasses of a parent
class. For example, in Java programming, if classes B and C
inherit from a class named A, then the NOC value for class A
is 2. RFC measures how many methods of a class can be
invoked in response to a message. It is assumed that the
higher the value of RFC, the higher the complexity and
the lower the maintainability of the code. For
example, if Class A is inherited by Class B and both have
two methods, then an object of Class B can invoke all
four methods; therefore, RFC (Class B) = 4, whereas RFC
(Class A) = 2. WMC is the sum of the complexities of all
methods defined in a class. If the WMC value of a class is
high, the class is more complex, and vice versa. DAC is the
count of abstract data types defined in a class. A high DAC
value generally indicates that a class has too many
responsibilities; in other words, it has many fields that
reference other types. For example, if Class A declares two
different objects, then DAC (Class A) = 2. MPC counts how
often a class's methods refer to methods in other classes,
indicating how dependent local methods are on methods
implemented by other classes. It enables analysis of the
message transmission (method calls) between the objects of
the involved classes. NOM counts the total number of public
methods defined in a class. For example, if two methods are
public in a class A, one is private, and one is protected, then
NOM (Class A) = 2. SIZE1 is the number of semicolons in a
class, and SIZE2 is the sum of the number of attributes and
methods in a class.
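The two inheritance metrics can be illustrated with a small class hierarchy. This is a sketch assuming single inheritance stored as a child-to-parent map; it is not the CKJM tool's implementation, and class D is a hypothetical addition to the A, B, C example in the text.

```python
def dit(cls, parent_of):
    """Depth of Inheritance Tree: number of edges from cls up to the root."""
    depth = 0
    while parent_of.get(cls) is not None:
        cls = parent_of[cls]
        depth += 1
    return depth


def noc(cls, parent_of):
    """Number of Children: count of immediate subclasses of cls."""
    return sum(1 for parent in parent_of.values() if parent == cls)


# B and C inherit from A (as in the text); D inherits from B (hypothetical).
parent_of = {"A": None, "B": "A", "C": "A", "D": "B"}

print(noc("A", parent_of))  # immediate subclasses of A
print(dit("D", parent_of))  # path D -> B -> A
```

With this hierarchy, NOC for class A is 2, matching the example in the text, and D sits two levels below the root.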
Efficiency of metrics 

In the current study, three cases are defined for
selecting different sets of OO metrics, as described in
Table 3. Based on the three cases, C1, C2, and C3, the SMP
model is constructed, and its performance is evaluated.

Table 3. Test case description.

Case | Response variable | Predictor variables
C1 | Change metric | DIT, WMC, RFC, NOC, DAC, LCOM, MPC, NOM, SIZE2, SIZE1
C2 | Change metric | Reduced feature attributes using MLR
C3 | Change metric | Reduced feature attributes using PI


Feature selection 
A crucial step in predicting maintainability is
choosing the appropriate software metric set. This study
uses two different kinds of feature selection methods to
increase the predictability of OO software
maintainability. These methods assist in selecting the
relevant group of OO metrics from the more extensive
selection of options.
MLR 

Regression analysis estimates how a dependent
variable changes as an independent variable or group
of independent variables changes. In the current study,
MLR (Riaz et al., 2009) is utilized to analyze
how strong the relationship is between various OO
metrics and change. MLR can be expressed by
Equation (1).

$$M = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots \qquad (1)$$

In Equation (1), $M$ represents the dependent variable,
$X_1$, $X_2$, and $X_3$ are the independent variables, and
$b_0$, $b_1$, $b_2$, and $b_3$ are the coefficient values.
Four properties reported by MLR are the Coefficient
Estimate (CE), Standard Error (SE), t-statistic (t-stat),
and p-value. Hypothesis testing and the feature selection
process may be based on the p-value of the t-statistic.
For instance, if the p-value of $X_2$ in Equation (1) is
greater than 0.05, this term is insignificant at the 5%
significance level given the other terms in the model,
and vice versa.
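The t-statistic of a regression term is its coefficient estimate divided by its standard error. The sketch below recomputes it for three UIMS metrics using the estimates and standard errors reported in Table 4; the function and variable names are illustrative.

```python
def t_statistic(estimate, std_error):
    """t-statistic of a regression coefficient: estimate / standard error."""
    return estimate / std_error


# (coefficient estimate, standard error) pairs for UIMS, taken from Table 4.
uims = {
    "NOC": (10.705, 4.2832),
    "LCOM": (4.4209, 2.3058),
    "WMC": (4.2741, 2.7144),
}

t_values = {metric: t_statistic(ce, se) for metric, (ce, se) in uims.items()}
for metric, t in t_values.items():
    print(metric, round(t, 4))
```

The recomputed values agree with the t-values listed in Table 4 to the reported precision. Turning a t-statistic into a p-value additionally requires the t-distribution with the model's residual degrees of freedom, which is not shown here.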


The workflow of MLR is shown in Figure 3. 
Figure 3. Feature Selection using MLR. (Recoverable flowchart labels: Dataset; Identification of each OO metric and Change metric; Discard metric; Include metric.)


PI 

PI is another method used in the current study to
assess the significance of each predictor of a tree (Xu et
al., 2019). It involves evaluating the change in node risk
caused by the splits on each predictor and summing these
changes. The total sum is then divided by the number of
branch nodes. The difference between the risk of the
parent node and the combined risk of its two child
nodes is the change in node risk caused by a split on
the predictor. To illustrate, when a tree splits a
parent node (say, node 1) into two child nodes (for
instance, nodes 2 and 3), the importance of the split
predictor is increased according to Equation (2).

$$S_1 = \frac{R_1 - R_2 - R_3}{N_{branch}} \qquad (2)$$

$R_i$ represents the node risk of node $i$, and $N_{branch}$ is
the total number of branch nodes. $S_1$ is the significance of
node 1. A node's risk is its error or impurity weighted by
the probability associated with that node.

$$R_i = P_i E_i \qquad (3)$$

Here, $P_i$ denotes the node probability of node $i$, and $E_i$
is either the node error (for a tree grown using the twoing
criterion) or the node impurity (for a tree grown using an
impurity criterion such as the Gini index or deviance) of
node $i$.
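Equations (2) and (3) can be traced on a toy tree. The sketch below is illustrative rather than the implementation used in the study; the two-split tree, its node probabilities, and its impurities are hypothetical.

```python
from collections import defaultdict


def node_risk(probability, impurity):
    """Equation (3): node risk = node probability * node error or impurity."""
    return probability * impurity


def predictor_importance(splits, n_branch):
    """Sum the risk reduction of every split per predictor (Equation (2)),
    then divide by the total number of branch nodes."""
    totals = defaultdict(float)
    for predictor, parent, left, right in splits:
        reduction = node_risk(*parent) - node_risk(*left) - node_risk(*right)
        totals[predictor] += reduction
    return {name: total / n_branch for name, total in totals.items()}


# Each split: (predictor, parent, left child, right child),
# where every node is a (probability, impurity) pair. Values are hypothetical.
splits = [
    ("RFC", (1.0, 0.5), (0.6, 0.2), (0.4, 0.25)),  # root node split
    ("NOC", (0.6, 0.2), (0.3, 0.1), (0.3, 0.1)),   # split of the left child
]

importance = predictor_importance(splits, n_branch=2)
print(importance)
```

In this toy tree the root split on RFC removes more risk than the later split on NOC, so RFC receives the higher importance score.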
Performance Evaluation

Since the SMP belongs to the class of prediction models,
Accuracy and MMRE (Elish et al., 2015; Kumar & Rath,
2016; Wang et al., 2019) are the two parameters used for
the performance evaluation. The Accuracy measure is
defined in Equation (4). MMRE is used in this
study to compare results with earlier investigations.
MMRE is the mean of a measure called the
magnitude of relative error (MRE). Because of the linearity
of the mean, any measure that reduces the expected
MRE would also reduce the expected MMRE. MMRE is
calculated using Equation (5), where $X_i'$ represents the
predicted outcome, $X_i$ is the actual outcome, and $n$ is
the total number of observations.

$$\text{Accuracy (ACC)} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (4)$$

$$\text{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|X_i - X_i'|}{X_i} \qquad (5)$$
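MMRE can be computed in a few lines once actual and predicted values are available. The sketch below assumes every actual value is non-zero, since the relative error is undefined otherwise; the example values are hypothetical.

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    relative_errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(relative_errors) / len(relative_errors)


# Hypothetical change counts for three classes (illustration only).
actual = [10, 20, 40]
predicted = [8, 25, 40]

score = mmre(actual, predicted)
print(round(score, 4))
```

The three relative errors here are 0.2, 0.25, and 0, so the mean is 0.15; lower values indicate predictions closer to the actual outcomes.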
Results and Discussion 
This section describes the results obtained by multiple 
regression analysis and hypothesis testing, followed by 
the performance analysis of SMP for each case. Multiple 
linear regression predicts a variable using multiple 
predictors. When used for binary classification, it can be 
adapted by setting a threshold on predicted values for 
categorization. Hypothesis testing assesses each 
predictor's significance and their combined impact on the 
outcome, often determined using methods like p-values 
or F-tests. 
Results of Multiple Linear Regression 

This section presents the results of a regression 
analysis conducted on the UIMS and QUES software 
dataset to investigate the impact of various OO metrics 
on the Change metric as described in Table 4. The 
analysis aimed to discern the impact of different metrics 
on SMP performance and provide insights into the 
significance and direction of feature selection. Based on 
the p-value, the following null and alternate hypotheses 
are considered in the present work. 

H0: "No significant correlation exists between OO metric
and Change Metric."

H1: "There is a significant correlation between OO metric
and Change Metric."

When considering UIMS, Table 4 shows that the
intercept term in the regression model was estimated to
be 9.5395 with a standard error of 30.802. The associated
t-value of 0.3097 resulted in a p-value of 0.75908.
The intercept's p-value suggests that it is not
statistically significant, indicating that it may not
substantially influence the change metric. In the regression
analysis, the NOC metric has a coefficient estimate of
10.705, accompanied by a standard error of 4.2832. The
computed t-value of 2.4993 led to a p-value of 0.018582.
These results suggest that the NOC metric has a
statistically significant positive effect on the Change
metric for UIMS, hence rejecting the null hypothesis. The
LCOM metric had a coefficient estimate of 4.4209, and
its standard error was 2.3058. The t-value of 1.9173
resulted in a p-value of 0.065457. This implies that the
LCOM metric has a marginally significant positive
influence on the change metric, hence rejecting the null
hypothesis. For the WMC metric, the coefficient estimate
was 4.2741, and the standard error was 2.7144. The
t-value of 1.5746 corresponded to a p-value of 0.12658.
This suggests that the WMC metric has a marginally
significant positive effect on the Change metric, hence
rejecting the null hypothesis. Since the p-values for the DIT,
MPC, RFC, DAC, NOM, SIZE1, and SIZE2 metrics are
greater than 0.05, these metrics do not significantly impact
the change metric. Therefore, NOC, LCOM, and WMC are
selected as the subset of SMP features for UIMS.

When considering QUES, Table 4 shows that the
intercept term in the regression model was estimated to
be -3.4033 with a standard error of 16.365. The
corresponding t-value of -0.20796 resulted in a p-value of
0.83597. These findings indicate that the intercept term is
not statistically significant, suggesting that it may not
significantly contribute to the variation in the Change
metric. For LCOM, DAC, and WMC, the p-values are
0.0085661, 0.00094389, and 0.0032928, respectively.
This implies that these metrics have a statistically
significant negative effect on the Change metric, so the
null hypothesis is rejected for these metrics. The p-values
for the SIZE1 and SIZE2 metrics are 1.2129e-08 and 0.095063,
respectively. This reveals that SIZE1 has a highly
significant positive effect on the Change metric, and
SIZE2 has a marginally significant positive effect on the
Change metric, resulting in rejecting the null hypothesis.
The regression analysis on QUES also shows
that the p-values for the DIT, MPC, RFC, and NOM metrics are
greater than 0.05; therefore, these metrics show
no significant impact on the change metric. NOC takes the
value 0 for every QUES class, which results in a coefficient
estimate of 0 and a standard error of 0. Since NOC is
constant, its t-value and p-value are Not-a-Number (NaN).
This implies that the NOC predictor cannot contribute
meaningfully to the regression analysis due to its constant
nature.




Table 4. MLR Statistics for UIMS and QUES.

Metric | UIMS Estimate | UIMS SE | UIMS t-value | UIMS p-value | QUES Estimate | QUES SE | QUES t-value | QUES p-value
(Intercept) | 9.5395 | 30.802 | 0.3097 | 0.75908 | -3.4033 | 16.365 | -0.20796 | 0.83597
DIT | -3.5758 | 11.095 | -0.32229 | 0.74963 | 5.9466 | 7.4352 | 0.79979 | 0.42699
NOC | 10.705 | 4.2832 | 2.4993 | 0.018582 | 0 | 0 | NaN | NaN
MPC | 2.7355 | 6.0698 | 0.45067 | 0.6557 | -0.68361 | 0.67612 | -1.0111 | 0.31603
RFC | -0.92205 | 2.5565 | -0.36067 | 0.72105 | 0.45772 | 0.32204 | 1.4213 | 0.16041
LCOM | 4.4209 | 2.3058 | 1.9173 | 0.065457 | -3.8794 | 1.4271 | -2.7183 | 0.0085661
DAC | 11.21 | 15.377 | 0.729 | 0.47205 | -13.841 | 3.9786 | -3.4789 | 0.00094389
WMC | 4.2741 | 2.7144 | 1.5746 | 0.12658 | -1.4923 | 0.48743 | -3.0615 | 0.0032928
NOM | -2.5711 | 16.751 | -0.15349 | 0.87911 | -4.6262 | 3.6341 | -1.273 | 0.20793
SIZE2 | -0.25786 | 15.971 | -0.01615 | 0.98723 | 6.1088 | 3.6018 | 1.696 | 0.095063
SIZE1 | -0.29626 | 0.38651 | -0.76651 | 0.44979 | 0.36601 | 0.055481 | 6.597 | 1.2129e-08
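Feature selection by p-value amounts to keeping the metrics whose p-value falls below a chosen significance level. The sketch below applies this rule to the QUES p-values from Table 4. The 0.05 level comes from the text; the relaxed 0.10 level is an assumption used only to show how a marginally significant metric such as SIZE2 would be retained.

```python
import math


def select_by_p_value(p_values, alpha=0.05):
    """Keep metrics whose p-value is defined (not NaN) and below alpha."""
    return [
        metric
        for metric, p in p_values.items()
        if not math.isnan(p) and p < alpha
    ]


# QUES p-values from Table 4; NOC is constant in QUES, so its p-value is NaN.
ques_p = {
    "DIT": 0.42699, "NOC": float("nan"), "MPC": 0.31603, "RFC": 0.16041,
    "LCOM": 0.0085661, "DAC": 0.00094389, "WMC": 0.0032928,
    "NOM": 0.20793, "SIZE2": 0.095063, "SIZE1": 1.2129e-08,
}

strict = select_by_p_value(ques_p, alpha=0.05)
relaxed = select_by_p_value(ques_p, alpha=0.10)  # assumed threshold, see above
print(strict)
print(relaxed)
```

At the 5% level the rule keeps LCOM, DAC, WMC, and SIZE1; relaxing the threshold to 10% additionally admits SIZE2.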


Results of Predictor Importance 

Figure 4 shows the importance of each OO metric for
predicting the Change metric. The horizontal axis
represents each OO metric, and the vertical axis
represents the estimated importance values. For the OO
software system UIMS, the predictor variables ranked in
decreasing order of impact on the Change metric are RFC,
NOC, MPC, DAC, NOM, SIZE1, WMC, DIT, SIZE2,
and LCOM, whereas in the case of QUES, the
decreasing order of importance of predictors is SIZE2,
MPC, WMC, SIZE1, DAC, NOM, LCOM, RFC, and
DIT. The selected features for all the cases C1, C2, and
C3 presented in the current study are shown in Table 5.
Figure 4 (a & b). PI estimates for SMP.

Table 5. Selected Features for SMP in all cases.

Case | UIMS | QUES
C1 | DIT, NOC, RFC, LCOM, WMC, MPC, DAC, NOM, SIZE1, SIZE2 | DIT, NOC, RFC, LCOM, WMC, MPC, DAC, NOM, SIZE1, SIZE2
C2 | NOC, LCOM, WMC | LCOM, DAC, WMC, SIZE1, SIZE2
C3 | DAC, MPC, NOC, RFC, SIZE1, SIZE2, and WMC | DAC, MPC, NOM, WMC, SIZE1, and SIZE2

Comparative Analysis 

This section presents the results of a comprehensive
evaluation of SMP methods for UIMS and QUES in
various cases (C1, C2, and C3). The evaluation is based
on two performance metrics, i.e., MMRE and Accuracy.
The analysis aims to identify the most effective SMP
method for each case. Table 6 shows the results of each
case C1, C2, and C3 for UIMS. In Case C1, CT and NB
achieved the lowest MMRE value of 0.2500 and high
accuracy rates of 81.82%. In Case C2, CT
and SVM yielded MMRE values of 0.2441 and high
accuracy rates of 91.91%. This indicates that CT and
SVM outperformed other methods, delivering accurate
and consistent predictions for UIMS analysis in Case C2.
For Case C3, CT and SVM again achieved MMRE values of
0.2441, along with accuracy rates of 91.91%. These
results indicate that CT and SVM are well-suited for
UIMS analysis in Case C3, demonstrating accurate
predictions and minimal magnitude of relative error.


Table 6. MMRE and ACCURACY of each utilized
method for UIMS for each case.

Method | C1 MMRE | C1 Accuracy (%) | C2 MMRE | C2 Accuracy (%) | C3 MMRE | C3 Accuracy (%)
CT | 0.2500 | 81.82 | 0.2441 | 91.91 | 0.2441 | 91.91
ENS | 0.4167 | 54.55 | 0.3550 | 81.82 | 0.3550 | 81.82
DA | 0.3056 | 81.82 | 0.4167 | 90.91 | 0.4167 | 90.91
NB | 0.2500 | 81.82 | 0.2500 | 81.82 | 0.2500 | 81.82
SVM | 0.3056 | 63.64 | 0.2441 | 91.91 | 0.2441 | 91.91


The evaluation of different classification methods for
QUES across the distinct cases (C1, C2, and
C3) is summarized in Table 7. In Case C1, the CT and
ENS methods achieved the same MMRE value of 0.2778
and an accuracy rate of 76.19%. In Cases C2 and C3, ENS
improved to an MMRE of 0.2222 (with accuracy rates of
80.95% and 76.19%), whereas CT's MMRE rose to 0.3333 in
Case C3 with an accuracy rate of 71.43%. Overall, these
two methods show comparatively stable performance in QUES
analysis. DA exhibited varying MMRE and accuracy
scores across the cases. Notably, in Case C1, DA yielded
an MMRE of 0.3550 and an accuracy rate of 66.67%.
However, DA's performance deteriorated in Cases C2
and C3, resulting in a higher MMRE value of 0.4861 and a
lower accuracy rate of 57.14%. This
suggests that the effectiveness of DA depends on the
feature set used for QUES. NB and SVM
demonstrated moderate to low performance across the cases,
with NB yielding the highest MMRE value of 0.5139 and the
lowest accuracy rate of 52.38%. On the other hand, SVM's
performance remained relatively stable, with MMRE
values ranging from 0.3056 to 0.3194 and accuracy rates
of 71.43%.


Table 7. MMRE and ACCURACY of each utilized
method for QUES for each case.

Method | C1 MMRE | C1 Accuracy (%) | C2 MMRE | C2 Accuracy (%) | C3 MMRE | C3 Accuracy (%)
CT | 0.2778 | 76.19 | 0.2778 | 76.19 | 0.3333 | 71.43
ENS | 0.2778 | 76.19 | 0.2222 | 80.95 | 0.2222 | 76.19
DA | 0.3550 | 66.67 | 0.4861 | 57.14 | 0.4861 | 57.14
NB | 0.5139 | 52.38 | 0.5139 | 52.38 | 0.5139 | 52.38
SVM | 0.31 | 71.43 | 0.31 | 71.43 | 0.30 | 71.43
The best SMP models for analyzing UIMS and QUES
are chosen depending on the specific scenario and
preferred performance criteria. ENS and
DA produce dependable forecasts for both
UIMS and QUES. According to the overall trend
of the findings, these figures show that
ENS outperforms alternative SMP models in
terms of predictive capability.

Additionally, the MMRE values are compared with those 
reported in existing studies (Elish and Elish, 2009; 
Al-Jamimi, 2012; Chanda, 2012; Aljamaan, 2013; Kumar 
and Rath, 2016). The selected studies used the same 
datasets and the same performance measure, i.e., MMRE. 
The combination of CT and PI gives an MMRE of 0.2441 
for UIMS, which outperforms the other studies, whereas 
for QUES the combination of ENS and PI outperforms the 
others. The comparison is shown in Figure 5. 
Figure 5 (a & b). Error comparison of the proposed 
method with previous studies. 


The findings in Tables 6 and 7 highlight the potential 
of the suggested approaches to play a significant role in 
maintaining software systems in practice, and the 
comparison with previous studies in Figure 5 further 
supports this. Improved accuracy and reduced prediction 
error are the key features of these approaches for 
practical use. Identifying classes with poor 
maintainability and working on them can reduce the 
overall cost of software development. 

Overall, the computational cost of the approach depends 
on the dataset size, the number of features, and the 
complexity of the algorithm used in each step of the 
methodology. Scalability could become a concern when 
dealing with large-scale datasets or when applying 
resource-intensive algorithms. 


Conclusion 

The widely used UIMS and QUES software datasets were 
examined in this study, with a focus on improving SMP 
accuracy and assessing the effects of several OO metrics 
on the Change metric. The research builds an SMP model 
that integrates MLR, PI, and five ML techniques, with 
MLR and PI used for feature selection. In the MLR 
analysis of UIMS, NOC displayed a statistically 
significant positive effect on the Change metric, supported 
by its coefficient estimate of 10.705 and a low p-value of 
0.018582. Similarly, LCOM exhibited a weakly significant 
positive influence, whereas WMC demonstrated a marginal 
yet noteworthy positive impact. For QUES, the intercept 
term lacked statistical significance, implying a limited 
contribution to the variability in the QUES Change metric. 
LCOM, DAC, and WMC showed statistically significant 
adverse effects on the Change metric. In addition, SIZE1 
emerged with a highly significant positive influence, while 
SIZE2 registered a slight but discernible positive impact. 
In contrast, PI 
analysis underscored the substantial influence of metrics 
such as DAC, MPC, NOC, RFC, SIZE1, SIZE2, and 
WMC for UIMS and DAC, MPC, NOM, WMC, SIZE1, 
and SIZE2 for QUES. The comprehensive evaluation of 
SMP methods for both UIMS and QUES consistently 
revealed the strong performance of CT and ENS methods. 
These outcomes showcased their effectiveness across 
various scenarios, yielding precise predictions and low 
MMRE values. The study's findings demonstrate how 
defining important criteria for QUES and UIMS may offer 
practical guidance to practitioners in selecting the best 
SMP approaches for improved user experience and 
software quality. Additionally, showcasing the synergy 
between MLR, PI, and several ML techniques in the SMP 
model suggests that these methods might be included in 
additional software quality evaluation studies. The study's 
limitations include its focus on specific datasets, which 
may limit generalizability, and the fact that statistical 
significance does not guarantee practical impact. Future 
research might examine longitudinal changes in metrics 
and SMP techniques, as well as corroborate findings using 
a range of software datasets to enhance the SMP model's 
accuracy and usefulness in assessing software quality. In 
summary, this research highlighted influential metrics for 
UIMS and QUES, offering valuable guidance to 
practitioners when selecting SMP methods and conducting 
feature selection to enhance software quality and user 
experience. Further validation and practical 
implementation of these findings in real-world software 
contexts are highly recommended. 